Combining Structured and Unstructured Randomness in Large Scale PCA

Authors

  • Nikos Karampatziakis
  • Paul Mineiro
Abstract

Principal Component Analysis (PCA) is a ubiquitous tool with many applications in machine learning, including feature construction, subspace embedding, and outlier detection. In this paper, we present an algorithm for computing the top principal components of a dataset with a large number of rows (examples) and columns (features). Our algorithm leverages both structured and unstructured random projections to retain good accuracy while being computationally efficient. We demonstrate the technique on the winning submission to the KDD 2010 Cup.
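
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way structured and unstructured randomness could be combined, not the authors' method. It assumes the structured step is a signed feature-hashing projection of the columns and the unstructured step is a Gaussian test matrix inside a standard randomized range finder. The function name `hashed_randomized_pca` and the parameters `hash_dim` and `oversample` are hypothetical, chosen for illustration.

```python
import numpy as np


def hashed_randomized_pca(X, k, hash_dim, oversample=10, seed=0):
    """Approximate top-k principal directions of X in a hashed feature space.

    Illustrative sketch only (assumed design, not the paper's algorithm):
    structured randomness   = signed feature hashing of the columns,
    unstructured randomness = dense Gaussian test matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Structured step: each original feature is sent to one random bucket
    # with a random sign. The dense matrix S is for clarity only; a real
    # large-scale implementation would apply the hash on the fly.
    buckets = rng.integers(0, hash_dim, size=d)
    signs = rng.choice([-1.0, 1.0], size=d)
    S = np.zeros((d, hash_dim))
    S[np.arange(d), buckets] = signs
    Xh = X @ S

    # PCA works on centered data; hashing is linear, so centering the
    # hashed matrix equals hashing the centered matrix.
    Xh = Xh - Xh.mean(axis=0)

    # Unstructured step: Gaussian projection to find an approximate basis
    # for the column space of the hashed data.
    omega = rng.standard_normal((hash_dim, k + oversample))
    Q, _ = np.linalg.qr(Xh @ omega)

    # Exact SVD of the small projected matrix gives the approximate
    # singular values and right singular vectors of Xh.
    B = Q.T @ Xh
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:k], s[:k]
```

Calling `hashed_randomized_pca(X, k=3, hash_dim=20)` on a dense array `X` returns `k` orthonormal directions in the hashed space together with the corresponding approximate singular values. Mapping those directions back to the original feature space would need an additional step that this sketch leaves out.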

Similar resources

Distributed and Scalable PCA in the Cloud

Principal Component Analysis (PCA) is a popular technique with many applications. Recent randomized PCA algorithms scale to large datasets but face a bottleneck when the number of features is also large. We propose to mitigate this issue using a composition of structured and unstructured randomness within a randomized PCA algorithm. Initial experiments using a large graph dataset from Twitter s...

Techniques for Visualizing 3d Unstructured Meshes

We present a computational module for interactively visualizing large-scale 3D unstructured meshes. Scientists and engineers routinely solve large-scale computational boundary value problems on unstructured grids. These grids typically range from several hundred thousand elements to millions of elements. With this ability to solve such large-scale problems comes the challenge of viewing the ...

On Adding Structure to Unstructured Overlay Networks

Unstructured peer-to-peer overlay networks are very resilient to churn and topology changes, while requiring little maintenance cost. Therefore, they are an infrastructure to build highly scalable large-scale services in dynamic networks. Typically, the overlay topology is defined by a peer sampling service that aims at maintaining, in each process, a random partial view of peers in the system....

Structured Sparse Principal Component Analysis

We present an extension of sparse PCA, or sparse dictionary learning, where the sparsity patterns of all dictionary elements are structured and constrained to belong to a prespecified set of shapes. This structured sparse PCA is based on a structured regularization recently introduced by [1]. While classical sparse priors only deal with cardinality, the regularization we use encodes higher-orde...

Journal:
  • CoRR

Volume: abs/1310.6304

Publication date: 2013